Discovering Factions in the Computational Linguistics Community
Abstract
We present a joint probabilistic model of who cites whom in computational linguistics, and also of the words they use to do the citing. The model reveals latent factions, or groups of individuals whom we expect to collaborate more closely within their faction, cite within the faction using language distinct from citation outside the faction, and be largely understandable through the language used when cited from without. We conduct an exploratory data analysis on the ACL Anthology. We extend the model to reveal changes in some authors’ faction memberships over time.
Similar resources
Predicting Responses and Discovering Social Factors in Scientific Literature
We consider the problem of predicting measurable responses to scientific articles based primarily on their text content. Specifically, we consider papers in two fields (economics and computational linguistics) and make predictions about downloads and within-community citations. Our first two models investigate temporal and spatial aspects of the scientific community’s interests. A third model which...
Data Mining Meets Collocations Discovery
In this paper we discuss the problem of discovering interesting word sequences in the light of two traditions: sequential pattern mining (from data mining) and collocations discovery (from computational linguistics). Smadja (1993) defines a collocation as “a recurrent combination of words that cooccur more often than chance and that correspond to arbitrary word usages.” The notion of arbitrarin...
Attitudes in Iranian vs. Western Media Coverage of the Iranian Nuclear Issue
Employing the appraisal framework in discovering the way ideology is crystalized through discourse, the present study attempts to investigate how journalistic ideologies and political positions are manifested through attitudinal terms. Referring to White’s (2012) distinction of attitude types, inscribed vs. invoked, based on Martin and White’s (2005) appraisal theory, journalistic ideology toge...
Discovering Parallel Text from the World Wide Web
A parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and multilingual text mining. Constructing a parallel corpus requires effective alignment of parallel documents. In this paper, we develop a parallel page identification system for identifying and aligning parallel documents ...
Arabic Rhetorical Relations Extraction for Answering "Why" and "How to" Questions
In the current study we aim to exploit the discourse structure of Arabic text to automatically find answers to non-factoid questions ("Why" and "How to"). Our method is based on Rhetorical Structure Theory (RST), which many studies have shown to be a very effective approach for many computational linguistics applications such as text generation, text summarization and machine translation. For...